Quickly extracting all links from a web page using PowerShell
We maintain 500+ client websites in our environment. A few days ago we received a request for the list of links and images used on each home page. Collecting the links/URLs from 500+ pages by hand would be tedious, and manual work rarely gives 100% accurate results. So we decided to use the Links property of the object returned by PowerShell's Invoke-WebRequest cmdlet to reduce the manual effort. In this post, we will discuss the approach with a simple example using a single URL. To check multiple URLs, please refer to a similar article that reads the URLs from Excel and loops over them: https://dotnet-helpers.com/powershell/powershell-script-for-website-availability-monitoring-with-excel-report-as-output

PowerShell's Invoke-WebRequest is a powerful cmdlet that allows you to download, parse, and scrape web pages. It is commonly used to download files from the web via HTTP and HTTPS, but it can do more than that: the response object it returns also exposes the parsed contents of the page, such as its links and images, so you can use it to analyze web pages.

Example: Get the list of URLs

The script below lists each link's innerText together with the corresponding href, with duplicate hrefs removed:

    (Invoke-WebRequest -Uri "https://dotnet-helpers.com/powershell").Links | Sort-Object href -Unique | Format-List innerText, href

The grid view control lets you filter the URLs with a keyword search, and you can copy the selected rows to the clipboard with Ctrl + C:

    (Invoke-WebRequest -Uri "www.lantus.com").Links.Href | Sort-Object | Get-Unique | Out-GridView

To fetch the list of image URLs from the page, you can run the command below:

    (Invoke-WebRequest -Uri "https://dotnet-helpers.com").Images | Select-Object src
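The one-liners above handle one page at a time. As a rough sketch of how the same idea could be applied to a list of pages, the function below collects the unique link and image URLs for each page into one flat set of rows. The function name Get-PageLinkReport, the -Fetch parameter, and the urls.txt and links.csv file names are our own assumptions for illustration and are not part of the original script. It reads only the href and src properties, which we expect to be present in both Windows PowerShell 5.1 and PowerShell 7; innerText may not be populated in PowerShell 7, so it is left out here.

```powershell
function Get-PageLinkReport {
    param(
        [Parameter(Mandatory)]
        [string[]] $Uri,

        # Hypothetical hook: the script block that fetches a page.
        # It defaults to Invoke-WebRequest and can be replaced for offline testing.
        [scriptblock] $Fetch = {
            param($u)
            Invoke-WebRequest -Uri $u -UseBasicParsing -ErrorAction Stop
        }
    )

    foreach ($u in $Uri) {
        try {
            $response = & $Fetch $u
        }
        catch {
            # Skip pages that cannot be fetched instead of stopping the whole run
            Write-Warning "Could not fetch ${u}: $($_.Exception.Message)"
            continue
        }

        # One row per unique link on the page
        $response.Links |
            Where-Object { $_.href } |
            Sort-Object -Property href -Unique |
            ForEach-Object {
                [pscustomobject]@{ Page = $u; Type = 'Link'; Url = $_.href }
            }

        # One row per unique image on the page
        $response.Images |
            Where-Object { $_.src } |
            Sort-Object -Property src -Unique |
            ForEach-Object {
                [pscustomobject]@{ Page = $u; Type = 'Image'; Url = $_.src }
            }
    }
}
```

Assuming a text file urls.txt with one home page URL per line, the result could then be saved for review:

```powershell
Get-Content .\urls.txt | ForEach-Object { Get-PageLinkReport -Uri $_ } |
    Export-Csv -Path .\links.csv -NoTypeInformation
```

Emitting plain objects rather than formatted text keeps the output usable with Export-Csv, Out-GridView, or Where-Object, whichever suits the review.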